A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment
Authors
Abstract
Extraction-transformation-loading (ETL) tools have become important pieces of software responsible for integrating heterogeneous information from several sources. Carrying out the ETL process is potentially complex, hard and time-consuming. Organisations nowadays are concerned with vast quantities of data. Data quality is a technical concern in the data warehouse environment, and research over the last few decades has laid increasing stress on data quality issues in the data warehouse ETL process. Data quality can be ensured by cleaning the data prior to loading it into a warehouse. Since the data is collected from various sources, it arrives in various formats; standardizing those formats and cleaning the data is therefore a prerequisite for a clean data warehouse environment. Data quality attributes such as accuracy, correctness, consistency and timeliness are required for a knowledge discovery process. The purpose of the present state-of-the-art research work is to address data quality issues at all of the following stages of data warehousing: 1) data sources, 2) data integration, 3) data staging, and 4) data warehouse modelling and schematic design, and to formulate a descriptive classification of their causes. The discovered knowledge is used to repair the data deficiencies. This work proposes a framework for the quality of extraction, transformation and loading of data into a warehouse.
General Terms: Data warehousing, data cleansing, quality data, dirty data
Similar resources
Enactment of Medium and Small Scale Enterprise ETL(MaSSEETL)-an Open Source Tool
Data quality is a major area of concern in a Data Warehouse environment. ETL tools focus on the detection and correction of data quality problems that affect the success of a data warehouse. Data imported from a source into the data warehouse often has different quality, format, coding etc. In order to bring all the data together in a standard, homogeneous environment, Extraction-transformation-loading ...
An Open Source ETL Tool - Medium and Small Scale Enterprise ETL(MaSSEETL)
In a Data Warehouse (DW) environment, Extraction-Transformation-Loading (ETL) processes consume up to 70% of resources. Data quality tools aim at detecting and correcting data problems that affect the accuracy and efficiency of data analysis applications. Source data imported into the data warehouse often has different quality, format, coding etc. In order to bring all the data together in a sta...
Near Real Time ETL
Near real time ETL deviates from the traditional conception of data warehouse refreshment, which is performed off-line in a batch mode, and adopts the strategy of propagating changes that take place in the sources towards the data warehouse to the extent that both the sources and the warehouse can sustain the incurred workload. In this article, we review the state of the art for both convention...
A Generic Procedure for Integration Testing of ETL Procedures
Testing is one of the key factors in any software product’s success, and data warehouse systems are no exception. A data warehouse can be tested in different ways (e.g. front-end testing, database testing), but testing the data warehouse’s ETL procedures (sometimes called back-end testing [1]) is probably the most complex and critical data warehouse testing job, because it directly affects the qual...
Uclean: a Requirement Based Object-Oriented ETL Framework
A data warehouse is used to provide effective results from multidimensional data analysis. The accuracy and correctness of these results depend on the quality of the data. To improve data quality, data must be properly extracted, transformed and loaded into the data warehouse. This ETL process is the key to the success of a data warehouse. In this paper we propose a conceptual ETL framework for a...
Journal title:
Volume, issue:
Pages: -
Publication date: 2015